Random Multiclass Classification: Generalizing Random Forests to Random MNL and Random NB
Abstract
Random Forests (RF) is a successful classifier that performs comparably to AdaBoost but is more robust. The exploitation of two sources of randomness, random inputs (bagging) and random features, makes RF an accurate classifier in several domains. We hypothesize that methods other than classification or regression trees could also benefit from injecting randomness. This paper generalizes the RF framework to other multiclass classification algorithms, such as the well-established MultiNomial Logit (MNL) and Naive Bayes (NB). We propose Random MNL (RMNL) as a new bagged classifier combining a forest of MNLs estimated with randomly selected features. Analogously, we introduce Random Naive Bayes (RNB). We benchmark the predictive performance of RF, RMNL and RNB against state-of-the-art SVM classifiers. RF, RMNL and RNB outperform SVM. Moreover, generalizing RF seems promising, as reflected by the improved predictive performance of RMNL.
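The two sources of randomness named in the abstract, bootstrap resampling of the rows and a random subset of the features for each base model, can be illustrated with a small Random Naive Bayes sketch. This is not the authors' implementation: the Gaussian likelihood, the majority-vote aggregation, the toy data, and every function and parameter name (`fit_random_nb`, `n_estimators`, `n_features`, and so on) are assumptions made for illustration only.

```python
import math
import random
from collections import Counter, defaultdict


def fit_gaussian_nb(X, y, features):
    """Fit Gaussian Naive Bayes using only the given feature indices."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        stats = []
        for j in features:
            values = [r[j] for r in rows]
            mean = sum(values) / len(values)
            # Small floor keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in values) / len(values) + 1e-6
            stats.append((mean, var))
        model[label] = (math.log(len(rows) / len(X)), stats)
    return model


def predict_gaussian_nb(model, features, row):
    """Return the class with the highest log-posterior for one row."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, stats) in model.items():
        score = log_prior
        for j, (mean, var) in zip(features, stats):
            score += -0.5 * math.log(2 * math.pi * var)
            score -= (row[j] - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def fit_random_nb(X, y, n_estimators=25, n_features=2, seed=0):
    """Bagged ensemble of NB models, each on a random feature subset."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    ensemble = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]   # random inputs (bootstrap)
        features = rng.sample(range(d), n_features)  # random features
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        ensemble.append((features, fit_gaussian_nb(Xb, yb, features)))
    return ensemble


def predict_random_nb(ensemble, row):
    """Aggregate base predictions by majority vote (an assumed rule)."""
    votes = Counter(predict_gaussian_nb(m, f, row) for f, m in ensemble)
    return votes.most_common(1)[0][0]


def make_toy_data(seed=1, per_class=30):
    """Hypothetical three-class data with well-separated Gaussian clusters."""
    rng = random.Random(seed)
    centers = {"a": [0, 0, 0, 0], "b": [5, 5, 5, 5], "c": [10, 10, 10, 10]}
    X, y = [], []
    for label, center in centers.items():
        for _ in range(per_class):
            X.append([rng.gauss(m, 1.0) for m in center])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    ensemble = fit_random_nb(X, y)
    hits = sum(predict_random_nb(ensemble, row) == label for row, label in zip(X, y))
    print(f"training accuracy: {hits / len(y):.2f}")
```

The same loop would give a Random MNL sketch if the Naive Bayes fit were swapped for a multinomial logit fit, since only the base learner changes while the resampling and feature selection stay the same.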
Similar resources
Random Forests for multiclass classification: Random MultiNomial Logit
Several supervised learning algorithms are suited to classify instances into a multiclass value space. MultiNomial Logit (MNL) is recognized as a robust classifier and is commonly applied within the CRM (Customer Relationship Management) domain. Unfortunately, to date, it is unable to handle huge feature spaces typical of CRM applications. Hence, the analyst is forced to immerse himself into fe...
Random Forests for Big Data
Big Data is one of the major challenges of statistical science and has numerous consequences from algorithmic and theoretical viewpoints. Big Data always involve massive data but they also often include data streams and data heterogeneity. Recently some statistical methods have been adapted to process Big Data, like linear regression models, clustering methods and bootstrapping schemes. Based o...
Mixing Weak Learners In Semantic Parsing
We apply a novel variant of Random Forests (Breiman, 2001) to the shallow semantic parsing problem and show extremely promising results. The final system has a semantic role classification accuracy of 88.3% using PropBank gold-standard parses. These results are better than all others published except those of the Support Vector Machine (SVM) approach implemented by Pradhan et al. (2003) and Ran...
One Class Splitting Criteria for Random Forests
Random Forests (RFs) are strong machine learning tools for classification and regression. However, they remain supervised algorithms, and no extension of RFs to the one-class setting has been proposed, except for techniques based on second-class sampling. This work fills this gap by proposing a natural methodology to extend standard splitting criteria to the one-class setting, structurally gene...
Ensembles of Binary SVM Decision Trees
Ensemble methods are able to improve the predictive performance of many base classifiers. In this paper, we consider two ensemble learning techniques, bagging and random forests, and apply them to Binary SVM Decision Tree (SVM-BDT). Binary SVM Decision Tree is a tree based architecture that utilizes support vector machines for solving multiclass problems. It takes advantage of both the efficien...